v4.1.x: Unlink and rebind socket when session directory already exists #9266
Merged
jsquyres merged 1 commit into open-mpi:v4.1.x on Aug 29, 2021
gpaulsen
approved these changes
Aug 17, 2021
The session directory created during MPI process execution is sometimes left behind without cleanup even after the process terminates; this mostly happens when the orte daemon is SIGKILL'd. This change ensures smooth socket binding by unlinking the existing socket file (if one exists in the session_directory) and rebinding it, thus avoiding bind() failures due to unclean session directories. Signed-off-by: Austen Lauria <awlauria@us.ibm.com> (cherry picked from commit e228a1c)
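The unlink-and-rebind idea described in the commit message can be illustrated with a small standalone sketch. This is not the code from the commit: the helper name `bind_unix_socket_fresh`, the backlog value, and the choice to retry only after `bind()` reports `EADDRINUSE` are assumptions made for illustration, using only standard POSIX calls.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper (not from Open MPI): create a listening UNIX-domain
 * socket at `path`. If a socket file from an earlier, uncleanly terminated
 * run is still present, remove it and bind again.
 * Returns the listening descriptor, or -1 with errno set. */
int bind_unix_socket_fresh(const char *path)
{
    struct sockaddr_un addr;
    int fd, saved;

    /* sun_path is a fixed-size buffer; reject paths that do not fit. */
    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path); /* safe: length checked above */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        /* EADDRINUSE means the path already exists, e.g. a leftover file
         * in a session directory that was never cleaned up. */
        if (errno != EADDRINUSE) {
            goto fail;
        }
        if (unlink(path) < 0 && errno != ENOENT) {
            goto fail;
        }
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            goto fail;
        }
    }

    if (listen(fd, 8) < 0) { /* backlog of 8 is an arbitrary choice */
        goto fail;
    }
    return fd;

fail:
    saved = errno; /* keep the original error across close() */
    close(fd);
    errno = saved;
    return -1;
}
```

Closing a bound UNIX-domain socket does not remove its file, so calling the helper twice on the same path without an intervening `unlink()` exercises the stale-file path. Note that this approach also removes a path that a live process may still be listening on, so it suits directories owned by a single job.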
5aa22fa to 1e6b91d
rhc54
approved these changes
Aug 29, 2021